Safety issue


The Only Thing Standing Between Humanity and AI Apocalypse Is … Claude?

WIRED

As AI systems grow more powerful, Anthropic's resident philosopher says the startup is betting Claude itself can learn the wisdom needed to avoid disaster. Anthropic is locked in a paradox: Among the top AI companies, it's the most obsessed with safety and leads the pack in researching how models can go wrong. But even though the safety issues it has identified are far from resolved, Anthropic is pushing just as aggressively as its rivals toward the next, potentially more dangerous, level of artificial intelligence. Its core mission is figuring out how to resolve that contradiction. Last month, Anthropic released two documents that both acknowledged the risks associated with the path it's on and hinted at a route it could take to escape the paradox.


Health Department Will Mine Unverified Vaccine Injury Claims With New AI Tool

Mother Jones

Experts worry it will be used to further Robert F. Kennedy Jr.'s anti-vaccine agenda. The US Department of Health and Human Services (HHS) is developing a generative artificial intelligence tool to find patterns across data reported to a national vaccine monitoring database and to generate hypotheses on the negative effects of vaccines, according to an inventory released last week of all use cases the agency had for AI in 2025. The tool has not yet been deployed, according to the HHS document, and an AI inventory report from the previous year shows that it has been in development since late 2023. But experts worry that the predictions it generates could be used by HHS secretary Robert F. Kennedy Jr. to further his anti-vaccine agenda.


HHS Is Making an AI Tool to Create Hypotheses About Vaccine Injury Claims

WIRED

Experts worry Robert F. Kennedy Jr.'s Health Department will use an internal AI tool to analyze vaccine injury claims in a way that furthers his anti-vaccine agenda. The US Department of Health and Human Services is developing a generative artificial intelligence tool to find patterns across data reported to a national vaccine monitoring database and to generate hypotheses on the negative effects of vaccines, according to an inventory released last week of all use cases the agency had for AI in 2025. The tool has not yet been deployed, according to the HHS document, and an AI inventory report from the previous year shows that it has been in development since late 2023. But experts worry that the predictions it generates could be used by Health and Human Services secretary Robert F. Kennedy Jr. to further his anti-vaccine agenda. A long-standing vaccine critic, Kennedy has upended the childhood vaccination schedule in his year in office, removing several shots from a list of recommended immunizations for all children, including those for Covid-19, influenza, hepatitis A and B, meningococcal disease, rotavirus, and respiratory syncytial virus, or RSV.


Large language models provide unsafe answers to patient-posed medical questions

Draelos, Rachel L., Afreen, Samina, Blasko, Barbara, Brazile, Tiffany L., Chase, Natasha, Desai, Dimple Patel, Evert, Jessica, Gardner, Heather L., Herrmann, Lauren, House, Aswathy Vaikom, Kass, Stephanie, Kavan, Marianne, Khemani, Kirshma, Koire, Amanda, McDonald, Lauren M., Rabeeah, Zahraa, Shah, Amy

arXiv.org Artificial Intelligence

Millions of patients are already using large language model (LLM) chatbots for medical advice on a regular basis, raising patient safety concerns. This physician-led red-teaming study compares the safety of four publicly available chatbots--Claude by Anthropic, Gemini by Google, GPT-4o by OpenAI, and Llama3-70B by Meta--on a new dataset, HealthAdvice, using an evaluation framework that enables quantitative and qualitative analysis. In total, 888 chatbot responses are evaluated for 222 patient-posed advice-seeking medical questions on primary care topics spanning internal medicine, women's health, and pediatrics. We find statistically significant differences between chatbots. The rate of problematic responses varies from 21.6 percent (Claude) to 43.2 percent (Llama), with unsafe responses varying from 5 percent (Claude) to 13 percent (GPT-4o, Llama). Qualitative results reveal chatbot responses with the potential to lead to serious patient harm. This study suggests that millions of patients could be receiving unsafe medical advice from publicly available chatbots, and further work is needed to improve the clinical safety of these powerful tools.
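To make the headline comparison concrete, here is a minimal sketch of the kind of significance test the abstract implies. The counts are back-computed from the reported percentages (21.6% and 43.2% of 222 questions per chatbot) and are illustrative reconstructions, not the paper's raw data.

```python
# Hypothetical reconstruction: do problematic-response rates differ
# significantly between two chatbots? Counts are back-computed from the
# abstract's percentages (21.6% and 43.2% of 222 questions each).
from scipy.stats import chi2_contingency

n_questions = 222
problematic = {"Claude": 48, "Llama": 96}  # ~21.6% and ~43.2% of 222

# 2x2 contingency table: [problematic, not problematic] per chatbot
table = [
    [problematic["Claude"], n_questions - problematic["Claude"]],
    [problematic["Llama"], n_questions - problematic["Llama"]],
]
chi2, p_value, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, p = {p_value:.2g}")  # small p: rates differ significantly
```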


LongSafety: Evaluating Long-Context Safety of Large Language Models

Lu, Yida, Cheng, Jiale, Zhang, Zhexin, Cui, Shiyao, Wang, Cunxiang, Gu, Xiaotao, Dong, Yuxiao, Tang, Jie, Wang, Hongning, Huang, Minlie

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) continue to advance in understanding and generating long sequences, the long context itself introduces new safety concerns. Yet the safety of LLMs in long-context tasks remains under-explored, leaving a significant gap in both the evaluation and the improvement of their safety. To address this, we introduce LongSafety, the first comprehensive benchmark specifically designed to evaluate LLM safety in open-ended long-context tasks. LongSafety encompasses 7 categories of safety issues and 6 user-oriented long-context tasks, with a total of 1,543 test cases averaging 5,424 words per context. Our evaluation of 16 representative LLMs reveals significant safety vulnerabilities, with most models achieving safety rates below 55%. Our findings also indicate that strong safety performance in short-context scenarios does not necessarily correlate with safety in long-context tasks, underscoring the unique challenges and the urgency of improving long-context safety. Through extensive analysis, we identify the safety issues and task types that are most challenging for long-context models, and we find that relevant context and extended input sequences can exacerbate safety risks, highlighting the need for continued attention to long-context safety. Our code and data are available at https://github.com/thu-coai/LongSafety.
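The paper's headline metric, a safety rate, is straightforward to compute once each response has been judged safe or unsafe. The sketch below shows a per-category aggregation; the field names (category, is_safe) are assumptions for illustration, since the actual schema is defined in the LongSafety repository.

```python
# A minimal sketch of per-category safety-rate aggregation, assuming judged
# results are stored one JSON object per line with "category" and "is_safe"
# fields (hypothetical names; see the LongSafety repo for the real schema).
import json
from collections import defaultdict

def safety_rates(results_path: str) -> dict[str, float]:
    safe = defaultdict(int)    # safe responses per category
    total = defaultdict(int)   # all responses per category
    with open(results_path) as f:
        for line in f:
            record = json.loads(line)
            total[record["category"]] += 1
            safe[record["category"]] += int(record["is_safe"])
    return {cat: safe[cat] / total[cat] for cat in total}
```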


CFSafety: Comprehensive Fine-grained Safety Assessment for LLMs

Liu, Zhihao, Hu, Chenhui

arXiv.org Artificial Intelligence

As large language models (LLMs) rapidly evolve, they bring significant conveniences to our work and daily lives, but they also introduce considerable safety risks. These models can generate texts with social biases or unethical content, and under specific adversarial instructions, may even incite illegal activities. Rigorous safety assessments of LLMs are therefore crucial. In this work, we introduce a safety assessment benchmark, CFSafety, which integrates 5 classic safety scenarios and 5 types of instruction attacks, totaling 10 categories of safety questions, to form a test set of 25k prompts. The test set is used to evaluate the safety of LLMs' natural language generation (NLG), scoring each response with a combination of a simple moral judgment and a 1-5 safety rating scale. Using this benchmark, we tested eight popular LLMs, including the GPT series. The results indicate that while GPT-4 demonstrated superior safety performance, the safety of LLMs, including this model, still requires improvement. The data and code associated with this study are available on GitHub.


ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models

Zhang, Hengxiang, Gao, Hongfu, Hu, Qiang, Chen, Guanhua, Yang, Lili, Jing, Bingyi, Wei, Hongxin, Wang, Bing, Bai, Haifeng, Yang, Lei

arXiv.org Artificial Intelligence

With the rapid development of large language models (LLMs), understanding their capability to identify unsafe content has become increasingly important. While previous works have introduced several benchmarks to evaluate the safety risk of LLMs, the community still has a limited understanding of current LLMs' capability to recognize illegal and unsafe content in Chinese contexts. In this work, we present a Chinese safety benchmark (ChineseSafe) to facilitate research on the content safety of large language models. To align with the regulations for Chinese Internet content moderation, our ChineseSafe contains 205,034 examples across 4 classes and 10 sub-classes of safety issues. For Chinese contexts, we add several special types of illegal content: political sensitivity, pornography, and variant/homophonic words. Moreover, we employ two methods to evaluate the legal risks of popular LLMs, covering both open-source models and APIs. The results reveal that many LLMs are vulnerable to certain types of safety issues, exposing them to legal risks in China. Our work provides a guideline for developers and researchers to facilitate the safety of LLMs.
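Since the benchmark frames safety as recognizing unsafe content, a natural failure metric is the per-sub-class miss rate: how often a model judges unsafe content as safe. A minimal sketch follows; the (sub_class, gold, verdict) tuple format is an assumed data structure, not the benchmark's actual one.

```python
# Sketch of a per-sub-class miss rate: how often unsafe content is judged
# safe. The (sub_class, gold, verdict) tuple format is an assumption.
from collections import Counter

def miss_rates(examples):
    """examples: iterable of (sub_class, gold_label, model_verdict) tuples."""
    missed, totals = Counter(), Counter()
    for sub_class, gold, verdict in examples:
        if gold == "unsafe":
            totals[sub_class] += 1
            missed[sub_class] += int(verdict == "safe")  # unsafe judged safe
    return {s: missed[s] / totals[s] for s in totals}
```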


Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models

Wang, Fei, Mehrabi, Ninareh, Goyal, Palash, Gupta, Rahul, Chang, Kai-Wei, Galstyan, Aram

arXiv.org Artificial Intelligence

Data is a crucial element in large language model (LLM) alignment. Recent studies have explored using LLMs for efficient data collection. However, LLM-generated data often suffers from quality issues, with underrepresented or absent aspects and low-quality datapoints. To address these problems, we propose Data Advisor, an enhanced LLM-based method for generating data that takes the characteristics of the desired dataset into account. Starting from a set of pre-defined principles, Data Advisor monitors the status of the generated data, identifies weaknesses in the current dataset, and advises the next iteration of data generation accordingly. Data Advisor can be easily integrated into existing data generation methods to enhance data quality and coverage. Experiments on the safety alignment of three representative LLMs (Mistral, Llama2, and Falcon) demonstrate the effectiveness of Data Advisor in enhancing model safety against various fine-grained safety issues without sacrificing model utility.
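The abstract describes a closed loop: monitor the generated data, identify weaknesses, and advise the next round of generation. A control-flow sketch of that loop follows; the three callables are placeholders standing in for LLM-backed steps, not the paper's implementation.

```python
# Control-flow sketch of the Data Advisor loop described in the abstract.
# generate/summarize/identify_weakness are placeholder callables standing
# in for LLM-backed steps; this is not the paper's implementation.
def data_advisor_loop(generate, summarize, identify_weakness, n_rounds=10):
    dataset = []
    advice = "cover the pre-defined principles broadly"  # initial guidance
    for _ in range(n_rounds):
        dataset.extend(generate(advice))      # generate a batch, guided by advice
        status = summarize(dataset)           # monitor the current dataset's status
        advice = identify_weakness(status)    # e.g. "underrepresented: self-harm"
    return dataset
```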


Safety challenges of AI in medicine

Wang, Xiaoye, Zhang, Nicole Xi, He, Hongyu, Nguyen, Trang, Yu, Kun-Hsing, Deng, Hao, Brandt, Cynthia, Bitterman, Danielle S., Pan, Ling, Cheng, Ching-Yu, Zou, James, Liu, Dianbo

arXiv.org Artificial Intelligence

Recent advancements in artificial intelligence (AI), particularly in deep learning and large language models (LLMs), have accelerated their integration into medicine. However, these developments have also raised public concerns about the safe application of AI. In healthcare, these concerns are especially pertinent, as the ethical and secure deployment of AI is crucial for protecting patient health and privacy. This review examines potential risks in AI practices that may compromise safety in medicine, including reduced performance across diverse populations, inconsistent operational stability, the need for high-quality data for effective model tuning, and the risk of data breaches during model development and deployment. For medical practitioners, patients, and researchers, LLMs provide a convenient way to interact with AI and data through language. However, their emergence has also amplified safety concerns, particularly due to issues like hallucination. The second part of this article explores safety issues specific to LLMs in medical contexts, including limitations in processing complex logic, challenges in aligning AI objectives with human values, the illusion of understanding, and concerns about diversity. Thoughtful development of safe AI could accelerate its adoption in real-world medical settings.


Tesla's Cybertruck disaster: Insider reveals 'serious safety issues' behind scenes of EV rollout - as drone footage shows hundreds of unfinished trucks backed up at Texas factory

Daily Mail - Science & tech

Customer reports that Tesla has halted deliveries of its futuristic Cybertruck amid allegedly dangerous safety issues with its gas pedal come as no surprise to one former insider, engineer Cristina Balan. 'After I left, it got worse,' said Balan, who is suing her former boss Elon Musk's electric car company for defamation. 'I have quite a few people that are right now in Tesla,' Balan said. 'They brought some serious safety issues to my attention.' New Cybertruck owners have described its gas pedal as a 'deathtrap,' demonstrating how the pedal cover can slide off the accelerator and become snagged on the carpet, locking it in place and spurring the car to accelerate at top speed.